TABLE 1.1
Results reported in BinaryConnect [48] and BinaryNet [99].

Method                                            | MNIST        | CIFAR-10
BinaryConnect (only binary weights)               | 1.29 ± 0.08% | 9.90%
BinaryNet (binary both weights and activations)   | 1.40%        | 10.15%
1.1 Principal Methods
This section will review binary and 1-bit neural networks and highlight their similarities
and differences.
1.1.1 Early Binary Neural Networks
BinaryConnect [48] was the first work to restrict the weights to +1 or −1 during propagation, while leaving the inputs unbinarized. The binary operations involved are simple and readily understandable. One way to binarize the weights of a CNN is deterministically, using the sign function:
\omega_b =
\begin{cases}
  +1, & \text{if } \omega \geq 0 \\
  -1, & \text{otherwise},
\end{cases}
\qquad (1.1)
where ω_b is the binarized weight and ω the real-valued weight. A second way is to binarize stochastically:
\omega_b =
\begin{cases}
  +1, & \text{with probability } p = \sigma(\omega) \\
  -1, & \text{with probability } 1 - p,
\end{cases}
\qquad (1.2)
where σ is the “hard sigmoid” function, σ(x) = clip((x + 1)/2, 0, 1). The training process for these networks differs slightly from that of full-precision neural networks. Forward propagation uses the binarized weights instead of the full-precision weights, but backward propagation is the same as in conventional methods: the gradient ∂C/∂ω_b (where C is the cost function) is calculated and then combined with the learning rate to update the full-precision weights directly.
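The procedure can be sketched in a few lines of NumPy. The helper names (hard_sigmoid, binarize_deterministic, binarize_stochastic), the toy gradient, and the weight shapes below are illustrative only, not code from the original papers; the sketch assumes the hard sigmoid σ(x) = clip((x + 1)/2, 0, 1) used by BinaryConnect.

import numpy as np

def hard_sigmoid(x):
    # "Hard sigmoid" from BinaryConnect: clip((x + 1) / 2, 0, 1).
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def binarize_deterministic(w):
    # Eq. (1.1): sign binarization, w >= 0 -> +1, otherwise -> -1.
    return np.where(w >= 0, 1.0, -1.0)

def binarize_stochastic(w, rng):
    # Eq. (1.2): +1 with probability p = sigma(w), -1 with probability 1 - p.
    p = hard_sigmoid(w)
    return np.where(rng.random(w.shape) < p, 1.0, -1.0)

# One BinaryConnect-style update step (toy example).
rng = np.random.default_rng(0)
w_real = rng.normal(size=(4, 3))            # full-precision weights kept for training
w_bin = binarize_deterministic(w_real)      # forward pass uses the binarized weights
grad_wb = np.full_like(w_bin, 0.1)          # stands in for dC/dw_b from backpropagation
lr = 0.01
w_real -= lr * grad_wb                      # gradient w.r.t. w_b updates the real weights
w_real = np.clip(w_real, -1.0, 1.0)         # clip real weights to [-1, 1], as in BinaryConnect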
BinaryConnect only binarizes the weights, while BinaryNet [99] quantizes both the weights and activations. Like BinaryConnect, BinaryNet offers two ways to constrain weights and activations to either +1 or −1. BinaryNet also makes several changes to adapt to binary activations. The first is shift-based Batch Normalization (SBN), which avoids additional multiplications. The second is shift-based AdaMax in place of the Adam learning rule, which likewise reduces the number of multiplications. The third change concerns the input of the first layer: BinaryNet handles the continuous-valued inputs of the first layer as fixed-point numbers with m bits of precision. Training neural networks with extremely low-bit weights and activations was proposed as QNN [100]; as we are primarily reviewing work on binary networks, the details of QNN are omitted here. The error rates of these networks on representative datasets are shown in Table 1.1. However, these two networks perform unsatisfactorily on larger datasets, since weights constrained to +1 and −1 cannot be learned effectively. New methods for training BNNs and 1-bit networks are therefore needed.
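As a rough illustration of the shift-based idea, the following NumPy sketch approximates each scaling factor by a signed power of two (AP2), so that the corresponding multiplication could be realized as a bit shift on binary hardware. The function names ap2 and shift_based_batch_norm and the exact formulation are an illustrative reconstruction under these assumptions, not code from BinaryNet.

import numpy as np

def ap2(x):
    # Approximate power-of-two: round |x| to the nearest power of two, keep the sign.
    # Multiplying by ap2(x) can then be implemented as a bit shift.
    return np.sign(x) * 2.0 ** np.round(np.log2(np.abs(x) + 1e-12))

def shift_based_batch_norm(x, gamma, beta, eps=1e-5):
    # Sketch of shift-based Batch Normalization over a mini-batch (rows of x):
    # the scalings by 1/std and by gamma are replaced with power-of-two factors,
    # so the multiplications reduce to shifts.
    centered = x - x.mean(axis=0)
    var = (centered * ap2(centered)).mean(axis=0)      # shift-based variance estimate
    x_hat = centered * ap2(1.0 / np.sqrt(var + eps))   # shift-based normalization
    return ap2(gamma) * x_hat + beta

# Example: normalize a batch of 8 samples with 5 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
out = shift_based_batch_norm(x, gamma=np.ones(5), beta=np.zeros(5))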
Wang et al. [234] proposed Binarized Deep Neural Networks (BDNNs) for image clas-
sification tasks, where all the values and operations in the network are binarized. While
BinaryNet deals with CNNs, BDNNs target basic artificial neural networks consisting of
fully connected layers. Bitwise neural networks [117] likewise present a completely bitwise network in which all participating variables are bipolar binary values.